[[(book)Bowles_3.3_rocks | Measuring the Performance of Predictive Models]]

Performance Measures for Different Types of Problems

Performance measures for regression problems

  • mean squared error (MSE)
  • mean absolute error (MAE)
  • root MSE (RMSE, which is the square root of MSE)
  • Listing 3-1: Comparison of MSE, MAE and RMSE—regressionErrorMeasures.py
  • variance (mean squared deviation from the mean)
  • standard deviation (square root of variance)
  • """For example, if the MSE of the prediction error is roughly the same as the target variance (or the RMSE is roughly the same as target standard deviation), the prediction algorithm is not performing well. You could replace the prediction algorithm with a simple calculation of the mean of the targets and perform as well.
  • """The errors in Listing 3-1 have RMSE that’s about half the standard deviation of the targets. That is fairly good performance.
  • histogram of the error
  • tail behavior (quantile or decile boundaries)
  • degree of normality
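The regression measures above can be sketched in a few lines. This is a minimal illustration on toy data, not the book's `regressionErrorMeasures.py` listing; the function names and sample values are my own.

```python
import math

def regression_errors(targets, predictions):
    """Return (MSE, MAE, RMSE) for paired targets and predictions."""
    errors = [t - p for t, p in zip(targets, predictions)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae, math.sqrt(mse)

def mean_variance_std(values):
    """Return (mean, variance, standard deviation) of the targets."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var, math.sqrt(var)

# Toy data (not from the book), to compare RMSE against the target std.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
predictions = [1.1, 1.9, 3.2, 3.8, 5.1]

mse, mae, rmse = regression_errors(targets, predictions)
_, var, std = mean_variance_std(targets)

print(f"MSE={mse:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
print(f"target variance={var:.4f}  target std={std:.4f}")
# Rule of thumb from the text: if RMSE is close to the target std,
# the model is doing no better than always predicting the mean.
```

Here RMSE (~0.15) is about a tenth of the target standard deviation (~1.41), so by the book's rule of thumb the predictions carry real information.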

Classification problems

  • misclassification error rates
  • Generally, algorithms for doing classification can present predictions in the form of a probability instead of a hard click versus not-click decision. The algorithms considered in this book all output probabilities ... the data scientist has the option to use 50 percent as a threshold
  • confusion matrix or contingency table
    • confusionMatrix() ... takes the predictions, the corresponding actual values (labels), and a threshold value as input
  • receiver operating characteristic (ROC)
    • The ROC curve plots the true positive rate (abbreviated TPR) versus the false positive rate (FPR).
    • JB: does ROC also work with the misclassification rate?
  • area under the curve (AUC)
    • A perfect classifier has an AUC of 1.0
    • random guessing has an AUC of 0.5
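The classification measures above can also be sketched without any library. This is my own minimal version, not the book's `confusionMatrix()` or its `classifierPerformance_RocksVMines.py` listing; the AUC here uses the rank (Mann-Whitney) formulation rather than plotting the ROC curve.

```python
def confusion_counts(labels, probs, threshold):
    """Count (TP, FP, TN, FN) at a probability threshold; positive class = 1."""
    tp = fp = tn = fn = 0
    for label, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_auc(labels, probs):
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count half)."""
    pos = [p for l, p in zip(labels, probs) if l == 1]
    neg = [p for l, p in zip(labels, probs) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: a classifier that ranks all positives above all negatives.
labels = [1, 1, 0, 0]
probs = [0.9, 0.8, 0.2, 0.1]
print(confusion_counts(labels, probs, 0.5))  # 50 percent threshold
print(roc_auc(labels, probs))                # perfect ranking -> 1.0
```

Sweeping the threshold from 1 to 0 and recording (FPR, TPR) at each step traces out the ROC curve; random guessing gives AUC 0.5 because positives and negatives win the pairwise comparison equally often.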

Simulating Performance of Deployed Models

training set

test set

[validation set?]

  • used for n-fold cross-validation?
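The train/test split idea generalizes to n-fold cross-validation: partition the data into n folds and let each fold serve once as the test set. A minimal index-splitting sketch (my own, not from the book; it ignores shuffling and stratification):

```python
def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each of n_folds folds.
    The last fold absorbs any remainder when n_samples % n_folds != 0."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        stop = (k + 1) * fold_size if k < n_folds - 1 else n_samples
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test

# Usage: train on each train split, score on the held-out fold,
# then average the n scores to estimate deployed performance.
for train, test in n_fold_splits(10, 5):
    print(test)  # each sample appears in exactly one test fold
```

Averaging the per-fold scores uses all the data for both training and testing, which is why cross-validation is a common way to simulate the performance of a deployed model.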


  • CODE
    • Listing 3-1: Comparison of MSE, MAE and RMSE—regressionErrorMeasures.py
    • Figure 3-9: Confusion matrix example
    • Listing 3-2: Measuring Performance for Classifier Trained on Rocks-Versus-Mines—classifierPerformance_RocksVMines.py
    • Table 3-2: Dependence of Misclassification Error on Decision Threshold
    • Table 3-3: Cost of Mistakes for Different Decision Thresholds
    • Figure 3-10: In-sample ROC for rocks-versus-mines classifier
    • Figure 3-11: Out-of-sample ROC for rocks-versus-mines classifier